Learningtower: Comparative Analysis of PISA 2022 and Historical Data

Shabarish Sai and Guan Ru Chen

Department of Econometrics and Business Statistics

2024-10-18

Contributors

  • Shabarish Sai Subramanian

  • Guan Ru Chen

  • Dianne Cook

  • Kevin Y.X. Wang

  • Priya Ravindra Dingorkar

Introduction

The learningtower R package is designed to streamline the analysis of data from the OECD’s Programme for International Student Assessment (PISA). The package provides access to datasets from 2000 to 2022, allowing researchers to explore trends in education, student performance, and other contextual factors. It simplifies the handling of large, complex datasets, making it easier to conduct comparative studies across countries and years. The package is currently being updated to ensure compatibility with the 2022 PISA data and the latest functionality.

Collection of Data

PISA data is typically collected every three years from over 70 countries, targeting 15-year-old students (the 2022 cycle followed the 2018 cycle after a four-year gap). The assessment measures students’ abilities in reading, mathematics, and science through standardized tests. In addition to the tests, questionnaires are administered to students, teachers, and school principals to gather contextual data on educational environments, socio-economic status, and more. This comprehensive approach provides insights into the factors that affect student performance across different educational systems worldwide.

PISA Dataset

Code
student_data <- readRDS("../Data/student_2022.rds")
print(colnames(student_data))
 [1] "year"        "country"     "school_id"   "student_id" 
 [5] "mother_educ" "father_educ" "gender"      "computer"   
 [9] "internet"    "math"        "read"        "science"    
[13] "stu_wgt"     "desk"        "room"        "dishwasher" 
[17] "television"  "computer_n"  "laptop_n"    "car"        
[21] "book"        "wealth"      "escs"       

The student dataset includes the 23 columns printed above: year, country, school_id, student_id, mother_educ, father_educ, gender, computer, internet, math, read, science, stu_wgt, desk, room, dishwasher, television, computer_n, laptop_n, car, book, wealth, and escs. These columns describe each student’s background, academic performance, and access to resources, which supports analysis of educational outcomes alongside socio-economic factors.
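Because the dataset carries a student weight column (stu_wgt), country averages are usually computed as weighted means rather than plain means. The following is a minimal sketch of that idea in base R. The toy data frame, its values, and the helper name weighted_mean_by_country are invented for illustration, and it assumes stu_wgt can be used directly as a weight.

```r
# Toy stand-in for the student data; all values are invented for illustration.
toy_students <- data.frame(
  country = c("AUS", "AUS", "AUS", "JPN", "JPN", "JPN"),
  math    = c(480, 520, 500, 540, 560, NA),
  stu_wgt = c(1.0, 3.0, 2.0, 2.0, 2.0, 1.0)
)

# Hypothetical helper: weighted mean of one score column within each country,
# after dropping rows where the score is missing.
weighted_mean_by_country <- function(data, score_col) {
  complete <- data[!is.na(data[[score_col]]), ]
  groups <- split(complete, complete$country)
  sapply(groups, function(g) weighted.mean(g[[score_col]], g$stu_wgt))
}

math_means <- weighted_mean_by_country(toy_students, "math")
print(math_means)
```

The same helper could be pointed at "read" or "science" by changing the second argument.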

Gender Gap Analysis

Math

Code
# Packages used by the plotting code in this section
library(here)
library(dplyr)
library(ggplot2)
load(here("data/math_diff_conf_intervals.rda"))
math_diff_conf_intervals <- math_diff_conf_intervals %>%
  dplyr::filter(country_name %in%
                  c("Australia",
                    "New Zealand",
                    "Japan",
                    "Singapore",
                    "Saudi Arabia",
                    "Turkey",
                    "United States",
                    "Finland",
                    "Ukraine",
                    "Brazil",
                    "Argentina",
                    "Morocco")) 
  
math_plot <- ggplot(math_diff_conf_intervals,
                    aes(diff, country_name,
                        col = score_class)) +
  scale_colour_manual("",
      values = c("boys"="#3288bd",
                 "nodiff"="#969696",
                 "girls"="#f46d43")) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width=0) +
  geom_vline(xintercept = 0, color = "#969696") +
  labs(y = "",
  x = "",
  title = "Math"
  ) +
  theme(legend.position="none") +
  annotate("text", x = 50, y = 1, label = "Girls") +
  annotate("text", x = -50, y = 1, label = "Boys") +
  scale_x_continuous(limits = c(-70, 70),
                     breaks = seq(-60, 60, 20),
                     labels = abs(seq(-60, 60, 20)))
math_plot

Explanation

This plot shows the gender gap in mathematics across the selected countries. The x-axis gives the difference in average maths scores (girls’ scores - boys’ scores), and the y-axis lists the countries. Each point is a country’s average score difference, and the line through it is the confidence interval. Grey points indicate no discernible gender difference, red points mark countries where girls outperform boys, and blue points mark countries where boys outperform girls. The size of the gap varies considerably: boys outperform girls in many countries, and the reverse holds in only a few.
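The quantity on the x-axis, girls’ average score minus boys’ average score, can be sketched in a few lines of base R. This is an illustrative sketch only: the toy values are invented, the helper name gender_diff is hypothetical, and it assumes the gender column uses the labels "female" and "male" and that stu_wgt is applied as a weight.

```r
# Invented toy data; column names mirror the student dataset.
toy <- data.frame(
  gender  = c("female", "female", "male", "male"),
  math    = c(500, 520, 530, 510),
  stu_wgt = c(1, 3, 2, 2)
)

# Hypothetical helper: weighted mean for girls minus weighted mean for boys.
# A positive result means girls score higher on average.
gender_diff <- function(data, score_col) {
  girls <- data[data$gender == "female", ]
  boys  <- data[data$gender == "male", ]
  weighted.mean(girls[[score_col]], girls$stu_wgt) -
    weighted.mean(boys[[score_col]], boys$stu_wgt)
}

math_diff <- gender_diff(toy, "math")
print(math_diff)
```

In this toy example the weighted means are 515 for girls and 520 for boys, so the difference is negative and the point would fall on the "Boys" side of the zero line.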

Reading

Code
load(here("data/read_diff_conf_intervals.rda"))
read_diff_conf_intervals <- read_diff_conf_intervals %>%
  dplyr::filter(country_name %in%
                  c("Australia",
                    "New Zealand",
                    "Japan",
                    "Singapore",
                    "Saudi Arabia",
                    "Turkey",
                    "United States",
                    "Finland",
                    "Ukraine",
                    "Brazil",
                    "Argentina",
                    "Morocco")) 
  
read_plot <- ggplot(read_diff_conf_intervals,
                    aes(diff, country_name,
                        col = score_class)) +
  scale_colour_manual("",
      values = c("boys"="#3288bd",
                 "nodiff"="#969696",
                 "girls"="#f46d43")) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width=0) +
  geom_vline(xintercept = 0, color = "#969696") +
  labs(y = "",
  x = "",
  title = "Reading"
  ) +
  theme(legend.position="none") +
  annotate("text", x = 50, y = 1, label = "Girls") +
  annotate("text", x = -50, y = 1, label = "Boys") +
  scale_x_continuous(limits = c(-70, 70),
                     breaks = seq(-60, 60, 20),
                     labels = abs(seq(-60, 60, 20)))
read_plot

Explanation

This plot shows the gender gap in reading scores across the selected countries. The x-axis gives the difference in average reading scores (girls’ scores - boys’ scores), and the y-axis lists the countries. Each point is a country’s average gender gap in reading, and the lines are bootstrap confidence intervals. The red points and lines show that girls score significantly higher than boys in the majority of countries, with differences on the positive side of the vertical zero line, which marks no difference. Few countries show boys outperforming girls, which is consistent with the global pattern of girls scoring higher in reading.
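A bootstrap confidence interval of the kind described here can be sketched as follows. This is a minimal percentile bootstrap under simplifying assumptions, not necessarily the procedure used to produce the stored intervals: the data are simulated, student weights are ignored, and the number of replicates (1000) and the 95% level are arbitrary choices.

```r
set.seed(2022)  # make the resampling reproducible

# Simulated scores for one hypothetical country.
toy <- data.frame(
  gender = rep(c("female", "male"), each = 50),
  read   = c(rnorm(50, mean = 510, sd = 30), rnorm(50, mean = 480, sd = 30))
)

# Girls' mean minus boys' mean for one (re)sample.
mean_diff <- function(data) {
  mean(data$read[data$gender == "female"]) -
    mean(data$read[data$gender == "male"])
}

# Resample students with replacement and recompute the difference each time.
boot_diffs <- replicate(1000, {
  resampled <- toy[sample(nrow(toy), replace = TRUE), ]
  mean_diff(resampled)
})

# Percentile interval: the middle 95% of the bootstrap differences.
ci <- quantile(boot_diffs, probs = c(0.025, 0.975), names = FALSE)
result <- c(diff = mean_diff(toy), lower = ci[1], upper = ci[2])
print(result)
```

The three values correspond to the diff, lower, and upper columns that the plotting code maps to the point and its error bar.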

Science

Code
load(here("data/sci_diff_conf_intervals.rda"))
sci_diff_conf_intervals <- sci_diff_conf_intervals %>%
  dplyr::filter(country_name %in%
                  c("Australia",
                    "New Zealand",
                    "Japan",
                    "Singapore",
                    "Saudi Arabia",
                    "Turkey",
                    "United States",
                    "Finland",
                    "Ukraine",
                    "Brazil",
                    "Argentina",
                    "Morocco")) 
  
sci_plot <- ggplot(sci_diff_conf_intervals,
                    aes(diff, country_name,
                        col = score_class)) +
  scale_colour_manual("",
      values = c("boys"="#3288bd",
                 "nodiff"="#969696",
                 "girls"="#f46d43")) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width=0) +
  geom_vline(xintercept = 0, color = "#969696") +
  labs(y = "",
  x = "",
  title = "Science"
  ) +
  theme(legend.position="none") +
  annotate("text", x = 50, y = 1, label = "Girls") +
  annotate("text", x = -50, y = 1, label = "Boys") +
  scale_x_continuous(limits = c(-70, 70),
                     breaks = seq(-60, 60, 20),
                     labels = abs(seq(-60, 60, 20)))
sci_plot

Explanation

This plot shows the gender gap in science scores across the selected countries, that is, the difference between girls’ and boys’ average science scores. The x-axis gives the gender difference, calculated as (girls’ scores - boys’ scores), and the y-axis lists the countries. Red points and lines mark countries where girls outperform boys, blue points and lines mark countries where boys outperform girls, and grey points and lines mark countries with no significant gender difference. The vertical line at zero marks no difference, which makes it easy to see that in most countries girls tend to perform better than boys in science, as shown by the positive values on the right side of the chart.
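The colour coding can be read as a rule applied to each confidence interval. The sketch below assumes one plausible rule: a country is labelled "girls" when the whole interval lies above zero, "boys" when it lies below zero, and "nodiff" when the interval covers zero. The source does not state how score_class was derived, so this rule, the helper name classify_gap, and the toy values are all assumptions.

```r
# Invented interval estimates for three hypothetical countries.
toy_intervals <- data.frame(
  country_name = c("A", "B", "C"),
  diff  = c(-12, 3, 20),
  lower = c(-18, -4, 11),
  upper = c(-6, 10, 29)
)

# Hypothetical rule: "girls" if the whole interval is above zero,
# "boys" if the whole interval is below zero, otherwise "nodiff".
classify_gap <- function(lower, upper) {
  ifelse(lower > 0, "girls", ifelse(upper < 0, "boys", "nodiff"))
}

toy_intervals$score_class <- classify_gap(toy_intervals$lower, toy_intervals$upper)
print(toy_intervals)
```

The resulting labels match the names used in scale_colour_manual() in the plotting code, so a column built this way could be mapped to colour directly.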

Overall Comparison

World Map

Code
library(plotly)
ggplotly(mrs_maps)

Explanation for Gender Gap by Countries

The figure displays three world maps showing the gender gap in math, science, and reading by country. The colour gradient encodes the gap value, with darker red indicating higher values and darker green indicating lower values. Each map shows how the gender gap varies across regions, with notable differences between continents. For example, the reading map shows more red, indicating a wider gap favouring one gender, whereas the maths map is green across much of the world, indicating smaller gaps. The science map resembles the maths map, with some noticeable regional variations.

Socio-Economic Factors Analysis

Temporal Analysis

Comparison

Explanation for Temporal Analysis

The three charts show how student performance in math, reading, and science changed from 2000 to 2022 in various countries. Each line shows a country’s average score, and labels highlight the trends of selected countries.

  • Mathematics: Singapore consistently ranks at the top, whereas Brazil and Peru have lower scores with some upward movement. Countries such as Belgium, Australia, and Germany remain comparatively stable around the 500-point mark.

  • Reading: Singapore again leads, and Australia, Belgium, and Canada continue to do well. Thailand, Brazil, and Peru score lower, though they improve gradually.

  • Science: Singapore and Canada perform at the top, while Australia, Germany, and Belgium hold mid-range scores. Brazil and Peru score lower but have shown some improvement.
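The country-by-year averages behind trend lines like these can be sketched with base R’s aggregate(). This is an illustrative sketch only: the scores are invented, only two years and two countries are shown, and student weights are ignored for brevity, whereas a full analysis would likely weight by stu_wgt.

```r
# Invented student-level scores for two countries and two assessment years.
toy_scores <- data.frame(
  country = c("SGP", "SGP", "SGP", "SGP", "BRA", "BRA", "BRA", "BRA"),
  year    = c(2018, 2018, 2022, 2022, 2018, 2018, 2022, 2022),
  math    = c(560, 580, 570, 580, 380, 390, 375, 385)
)

# One row per country and year, holding the (unweighted) mean math score.
trend <- aggregate(math ~ country + year, data = toy_scores, FUN = mean)

# Sort so that each country's years appear in order, ready for a line plot.
trend <- trend[order(trend$country, trend$year), ]
print(trend)
```

Each row of trend corresponds to one point on a country’s line, so plotting year against math with one line per country gives a chart of the kind described above.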

Limitations

Although the learningtower package makes it easier to access the PISA dataset, it has drawbacks, including fewer customisation options for more complex analyses, performance problems with very large datasets, and possible incompatibilities with other R versions. The documentation may also be inadequate for more complicated use cases. The PISA dataset has inherent limitations of its own: it is cross-sectional, which precludes longitudinal tracking of students; it may be subject to sampling biases; and linguistic and cultural differences could compromise the comparability of results. The depth of analysis may be further constrained by incomplete or missing data, a limited set of socioeconomic indicators, and out-of-date background questionnaires. Lastly, the emphasis on standardised test scores may obscure broader educational objectives that the tests do not measure, such as creativity and critical thinking.

Thank You